Add CLI sandbox runtime and fix HF Jobs script paths#237
Co-authored-by: OpenAI Codex <codex@openai.com>

Claude finished @lewtun's task in 2m 4s
Review
LGTM, no blocking issues, 1 P2. The widening of
I traced both call sites:
A bare filename that doesn't exist in the sandbox now produces a sandbox-read error instead of being passed through to remote
Tests cover the four shapes that matter (bare filename, nested relative, URL, name-with-args / inline).
P2: tool description out of date

Co-authored-by: OpenAI Codex <codex@openai.com>
Addressed in Checks:

Claude finished @lewtun's task in 52s
Review
LGTM, no blocking issues. The P2 from the prior review is addressed in
Nothing else changed in this commit; the resolution logic in

Claude finished @lewtun's task in 1m 42s
Review
LGTM, no blocking issues. The new commit since the prior review at
The merge brought in two orthogonal edits to PR-touched files from
No conflict-resolution issues in the PR's resolver or tests. Couldn't run the test suite locally (sandbox didn't permit

Co-authored-by: OpenAI Codex <codex@openai.com>

Claude finished @lewtun's task in 2m 17s
Review
LGTM, no blocking issues. The new commit since the prior review at
The PR's logical resolution code (
Couldn't run the test suite locally (no

Co-authored-by: OpenAI Codex <codex@openai.com>

Claude finished @lewtun's task in 7m 56s
Re-review: PR #237
LGTM, no blocking issues. The new commit since the prior review at
I traced the new control flow:
No P0 surface I can find. The pre-existing prompt-vs-mode tension (base prompt at
Couldn't run the test suite locally (no

Claude finished @lewtun's task in 2m 8s
Re-review: PR #237
LGTM, no blocking issues. The new commit since the prior review at
I traced the merge to confirm no interaction with the PR's surface:
Couldn't run the suite locally (no

Co-authored-by: OpenAI Codex <codex@openai.com>

Claude finished @lewtun's task in 55s
Re-review: PR #237
LGTM, no blocking issues. The new commit since the prior review at
I spot-checked the docs against the code:
The PR's logical resolution code (
Couldn't run the test suite locally (no

Co-authored-by: OpenAI Codex <codex@openai.com>

Claude finished @lewtun's task in 1m 34s
Re-review: PR #237
LGTM, no blocking issues. The new commit since the prior review at
I traced the wait against the preload's task lifecycle:
The PR's logical resolution code (
Couldn't run the suite locally (no

Co-authored-by: OpenAI Codex <codex@openai.com>

Claude finished @lewtun's task in 1m 46s
Re-review: PR #237
LGTM, no blocking issues. The only new work since the prior LGTM at
I traced the merge against the PR's surface:
The PR's logical resolution code (
Couldn't run the suite locally (no

Claude finished @lewtun's task in 1m 13s
Re-review: PR #237
LGTM, no blocking issues. The new commit since the prior LGTM at
Neither file is on the PR's logical surface, and no agent/tests/configs touched by this PR were modified by the merge (
The PR's logical resolution code (
Couldn't run the suite locally (no

Co-authored-by: OpenAI Codex <codex@openai.com>

Claude finished @lewtun's task in 1m 33s
Re-review: PR #237
LGTM, no blocking issues. The only new commit since the prior LGTM at
What changed:
Consistency check against rest of prompt:
The PR's logical resolution code (
Couldn't run the suite locally (no

Claude finished @lewtun's task in 1m 54s
Re-review: PR #237
LGTM, no blocking issues. The only new commit since the prior LGTM at
I confirmed the merge didn't touch the PR's logical surface:
The PR's logical resolution code (
Couldn't run the suite locally (no

Co-authored-by: OpenAI Codex <codex@openai.com>

Claude finished @lewtun's task in 2m 50s
Re-review: PR #237
LGTM, no blocking issues. The only new commit since the prior LGTM at
I traced the guard end-to-end:
The PR's logical resolution code (
Couldn't run the suite locally (no

Co-authored-by: OpenAI Codex <codex@openai.com>

Claude finished @lewtun's task in 2m 8s
Re-review: PR #237
LGTM, no blocking issues. The only new commit since the prior LGTM at
I traced the refactor end-to-end:
The PR's logical resolution code (
Couldn't run the suite locally (no

Claude finished @lewtun's task in 1m 4s
Re-review: PR #237
LGTM, no blocking issues. The only new commit since the prior LGTM at
Pure brand/string change. Couldn't run the suite locally (no

Summary
- Add an opt-in CLI sandbox runtime that exposes `sandbox_create`.
- `bash`, `read`, `write`, and `edit` operate on the local filesystem unless sandbox tools are explicitly enabled.
- Treat bare filenames like `train_smollm2.py` as sandbox script paths when resolving `hf_jobs` payloads.
- Update the system prompt so that `hf_jobs.script` uses inline code, sandbox files, or public/raw URLs, not host-local paths like `/fsx/...`.

Why
This came up while testing local model workflows from #228. Small local models can create a training script, then call `hf_jobs` with `script: "train_smollm2.py"` or a host-local path. Previously, bare filenames were not resolved from the sandbox, so the remote HF Job could fail with `No such file or directory`.

The same testing also showed that CLI local mode intentionally swaps sandbox tools for local filesystem tools, which means `sandbox_create` is unavailable by default. This PR makes the runtime explicit and opt-in instead of changing the default behavior.

Usage
Default local filesystem tools:
Opt into HF Space sandbox tools:
Or via config:
`{ "tool_runtime": "sandbox" }`

Sandbox mode requires an HF token even when the LLM itself is local, because it creates private HF Spaces.
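
To make the runtime switch concrete, here is a minimal Python sketch of how a `tool_runtime` config value could select a tool set and enforce the token requirement. It is an illustration only: the function name `select_tools`, the `local:`/`sandbox:` name prefixes, and the lookup of the token in an `HF_TOKEN` environment variable are assumptions, not the code in this PR.

```python
import json
import os
from typing import List, Mapping, Optional

# Tool names taken from the PR description; everything else here is hypothetical.
FILE_TOOLS = ("bash", "read", "write", "edit")


def select_tools(config: Mapping[str, str],
                 env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the tool names for the configured runtime (illustrative only)."""
    env = os.environ if env is None else env
    runtime = config.get("tool_runtime", "local")  # local is the default

    if runtime == "local":
        # Local mode: filesystem tools only, no sandbox_create.
        return [f"local:{name}" for name in FILE_TOOLS]

    if runtime == "sandbox":
        # Assumption: the token is read from HF_TOKEN. Sandbox mode needs it
        # because it creates private HF Spaces, even with a local LLM.
        if not env.get("HF_TOKEN"):
            raise RuntimeError("sandbox runtime requires an HF token")
        return ["sandbox_create"] + [f"sandbox:{name}" for name in FILE_TOOLS]

    raise ValueError(f"unknown tool_runtime: {runtime!r}")


config = json.loads('{ "tool_runtime": "sandbox" }')
tools = select_tools(config, env={"HF_TOKEN": "hf_example"})
```

Passing `env` explicitly keeps the sketch testable without touching the real environment.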
Behavior
- `tool_runtime: "local"` uses local `bash`/`read`/`write`/`edit` and excludes `sandbox_create`.
- `tool_runtime: "sandbox"` uses sandbox `bash`/`read`/`write`/`edit` and includes `sandbox_create`.
- Sandbox mode creates a `cpu-basic` sandbox and cleans it up on shutdown.
- Sandbox script paths such as `train_smollm2.py`, `./train_smollm2.py`, and `/app/train_smollm2.py` are read and submitted to HF Jobs as script content.
- URLs such as `https://example.com/train.py` are still passed through as Jobs URLs.
- Host-local paths such as `/Users/...`, `/home/...`, and `/fsx/...` are explicitly discouraged in the system prompt.

Example Failure Fixed
Before:
`{"script": "train_smollm2.py"}`
could fail remotely with a `No such file or directory` error.

After this change, the CLI reads `train_smollm2.py` from the sandbox and submits the script content, matching the behavior for explicit paths like `./train_smollm2.py`.

Tests
- `uv run pytest tests/unit`
- `uv run ruff check .`
- `uv run ruff format --check .`

Follow-up to #228.
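
For readers who want a picture of the script resolution described under Behavior, the following Python sketch classifies an `hf_jobs` `script` value and reads sandbox paths into script content. The names `ScriptKind`, `classify_script`, `resolve_script`, and `read_sandbox_file` are hypothetical, and the heuristics (whitespace means inline code or a command with arguments, a `.py` suffix means a sandbox path) are assumptions rather than the PR's actual rules.

```python
from enum import Enum
from typing import Callable


class ScriptKind(Enum):
    URL = "url"
    SANDBOX_PATH = "sandbox_path"
    INLINE = "inline"


def classify_script(script: str) -> ScriptKind:
    """Guess what kind of value was passed as `script` (assumed heuristics)."""
    text = script.strip()
    if text.startswith(("http://", "https://")):
        return ScriptKind.URL  # passed through as a Jobs URL
    if any(ch.isspace() for ch in text):
        # Inline code or a name followed by arguments: leave untouched.
        return ScriptKind.INLINE
    if text.endswith(".py"):
        # Bare, relative, or absolute path: treated as a sandbox file.
        return ScriptKind.SANDBOX_PATH
    return ScriptKind.INLINE


def resolve_script(script: str,
                   read_sandbox_file: Callable[[str], str]) -> str:
    """Return what would be submitted: file content for sandbox paths,
    otherwise the original value. `read_sandbox_file` is a stand-in for
    whatever reads a file from the sandbox; if it raises for a missing
    file, the error surfaces locally instead of in the remote job."""
    if classify_script(script) is ScriptKind.SANDBOX_PATH:
        return read_sandbox_file(script.strip())
    return script


# Usage with an in-memory stand-in for the sandbox filesystem.
fake_sandbox = {"train_smollm2.py": "print('training')"}
content = resolve_script("train_smollm2.py", fake_sandbox.__getitem__)
```

With this shape, a bare filename that is missing from the sandbox fails at read time on the CLI side, which matches the behavior the review describes.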